Spatio-temporal Segmentation of Video by Hierarchical Mean Shift Analysis
Abstract
We describe a simple new technique for spatio-temporal segmentation of video sequences. Each pixel of a 3D space-time video stack is mapped to a 7D feature point whose coordinates include three color components, two motion angle components and two motion position components. The clustering of these feature points provides color segmentation and motion segmentation, as well as a consistent labeling of regions over time, which amounts to region tracking. For this task we have adopted a hierarchical clustering method which operates by repeatedly applying mean shift analysis over increasingly large ranges, using at each pass the cluster centers of the previous pass, with weights equal to the counts of the points that contributed to the clusters. This technique has lower complexity for large mean shift radii than ordinary mean shift analysis because it can use binary tree structures more efficiently during range search. In addition, it provides a hierarchical segmentation of the data. Applications include video compression and compact descriptions of video sequences for video indexing and retrieval applications.

The support of NSF grants EAR and IIS is gratefully acknowledged.

Introduction and Related Work

One of the goals of video analysis is to find out as much as possible about what is going on in the scene from what was captured by the video. Finding out what is going on is more formally called semantic interpretation. To interpret a scene, one first needs to label independent objects. Boundaries of objects typically correspond to boundaries of color patches in the video, but situations where a foreground object is in front of a background of similar color can also occur. When the camera translates or when objects move independently, boundaries of objects correspond to boundaries across which the optical flow changes in the video, and the patches inside these image boundaries display some consistency of optical flow, but optical flow is notoriously unreliable at object boundaries. To overcome these limitations of motion segmentation and color segmentation, and to maximize the chances of correctly extracting objects, researchers have been combining motion and color cues in various ways.

The task of dividing video frames into patches that may correspond to objects in the scene is called object-based segmentation, layer extraction, sprite representation, or space-time segmentation (there are arguably subtle differences between these concepts). Color patches produce generalized cylinders in the 3D spatio-temporal pixel volume obtained by piling up frames into a 3D video stack. These space-time entities have been called color flows, action cylinders and feature trajectories in the literature. We refer to them as video strands. Our goal in this paper is to extract these strands and characterize them by color, average radius, and axis position and orientation.

There are three main strategies for space-time segmentation of video sequences:

1. Find spatial regions by segmenting each frame, then track these regions from frame to frame.
2. Track interest points to find their trajectories, then bundle these trajectories.
3. Perform 3D segmentation of the video stack.

The first strategy attempts to discover spatial structures and extend them in the temporal dimension, the second strategy discovers temporal structures and groups them in the spatial dimension, and the third strategy treats the spatial and temporal dimensions equally. The strategies are illustrated in a figure. In this figure, the horizontal dimension represents the sizes of the structures in the spatial dimension, and the vertical dimension represents their sizes in the temporal dimension. The segmentation starts with features that are small both spatially and temporally, near the lower left of the figure. The goal of spatio-temporal segmentation is to group these features into homogeneous structures in the 3D video stack that are large along both the spatial and temporal dimensions, and are therefore at the upper right of the figure. The three strategies correspond to three paths for growing such structures: horizontally then vertically, vertically then horizontally, and diagonally. Our approach belongs to the third category.

In the first category of spatio-temporal methods, frame-by-frame tracking, there are more variants than we can adequately review here. Typically the frames are segmented one after the other using motion information. Regions of the previous frame are generally shifted and projected into the current frame by motion compensation, and these projections are compared to the regions of the current frame in various ways to enforce temporal coherence between spatial regions. Another subcategory segments all the frames spatially in a first step, then ...

[Figure: 2D+t spatio-temporal segmentation of an image sequence. Horizontal axis: spatial structure. Vertical axis: temporal structure (instantaneous to long-term). Paths shown: region segmentation approach, temporal propagation or region matching, motion grouping.]
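The hierarchical clustering described in the abstract (repeated mean shift passes over increasingly large ranges, each pass run on the cluster centers of the previous pass, weighted by the counts of the points they absorbed) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a flat kernel, replaces the tree-based range search mentioned in the abstract with a brute-force scan, and merges modes that end up closer than half the radius. The function names, the merge rule and the sample radii are all hypothetical.

```python
import math


def _dist(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def weighted_mean_shift(points, weights, radius, max_iter=100, tol=1e-9):
    """One mean shift pass with a flat kernel (assumed kernel choice).

    Returns (centers, counts): merged modes and the total input weight
    that converged to each of them.
    """
    modes = []
    for start in points:
        mode = list(start)
        for _ in range(max_iter):
            total = 0.0
            acc = [0.0] * len(mode)
            # Brute-force range search; a tree structure would go here.
            for p, w in zip(points, weights):
                if _dist(p, mode) <= radius:
                    total += w
                    for k, v in enumerate(p):
                        acc[k] += w * v
            if total == 0.0:
                break
            new_mode = [a / total for a in acc]
            shift = _dist(new_mode, mode)
            mode = new_mode
            if shift < tol:
                break
        modes.append(mode)

    # Greedy merge of nearby modes (assumed rule: closer than radius / 2).
    centers, counts = [], []
    for mode, w in zip(modes, weights):
        for i, c in enumerate(centers):
            if _dist(mode, c) <= radius / 2:
                counts[i] += w
                centers[i] = [ci + (w / counts[i]) * (mi - ci)
                              for ci, mi in zip(c, mode)]
                break
        else:
            centers.append(mode)
            counts.append(w)
    return centers, counts


def hierarchical_mean_shift(points, radii):
    """Apply mean shift repeatedly with increasing radii.

    Each pass clusters the centers of the previous pass, weighted by
    the number of original points they represent.
    """
    centers = [list(p) for p in points]
    counts = [1.0] * len(points)
    levels = []
    for r in sorted(radii):
        centers, counts = weighted_mean_shift(centers, counts, r)
        levels.append((r, centers, counts))
    return levels


# Toy 2D data: two nearby groups and one distant group.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (1.0, 1.0), (1.1, 1.0),
              (10.0, 10.0), (10.1, 10.0)]
toy_levels = hierarchical_mean_shift(toy_points, [0.3, 3.0])
for r, centers, counts in toy_levels:
    print(r, len(centers), counts)
```

On this toy input the small radius separates three groups and the larger radius merges the two nearby ones, which gives a two-level hierarchy. Because each later pass works on a handful of weighted centers rather than on every original point, the cost of the large-radius pass shrinks accordingly. In the setting of the paper the inputs would be the 7D feature points rather than 2D toy coordinates.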
Similar resources
Spatio-temporal Segmentation of Video by Hierarchical Mean Shift Analysis
We describe a simple new technique for spatio-temporal segmentation of video sequences. Each pixel of a 3D space-time video stack is mapped to a 7D feature point whose coordinates include three color components, two motion angle components and two motion position components. The clustering of these feature points provides color segmentation and motion segmentation, as well as a consistent label...
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
Non-rigid Objects Detection and Segmentation in Video Sequence Using 3d Mean Shift Analysis
A new technique for robust detection and segmentation of non-rigid objects in video sequences is proposed in this paper. In our approach, spatio-temporal mean shift analysis (MSA) is employed to convert raw video/object data to their corresponding 3D/2D region feature spaces (RFS), respectively. The distance metric of RFS can be defined based on its spatio-temporal continuous property. Within the...
An Efficient Hierarchical Modulation based Orthogonal Frequency Division Multiplexing Transmission Scheme for Digital Video Broadcasting
Due to the growing number of users, efficient use of spectrum plays an important role in digital terrestrial television networks. In digital video broadcasting, local and global content are transmitted by single-frequency and multifrequency networks, respectively. Multifrequency networks support transmission of global content and consume a large amount of spectrum. Similarly local content are well ...
Video Segmentation Using Iterated Graph Cuts Based on Spatio-temporal Volumes
We present a novel approach to segmenting video using iterated graph cuts based on spatio-temporal volumes. We use the mean shift clustering algorithm to build the spatio-temporal volumes with different bandwidths from the input video. We compute the prior probability obtained by the likelihood from a color histogram and a distance transform using the segmentation results from graph cuts in the...
Publication date: 2002